13 research outputs found

    Toward an Understanding of Software Code Cloning as a Development Practice

    Get PDF
    Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches --- such as code bloat, duplication of bugs, and inconsistent bug fixing --- there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the ``cure'' (refactoring) likely to cause more harm than the ``disease'' (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies. Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong

    Subjectivity in Clone Judgment: Can We Ever Agree?

    Get PDF
    An objective definition of what a code clone is currently eludes the field. A small study was performed at an international workshop to elicit judgments and discussions from world experts regarding what characteristics define a code clone. Less than half of the clone candidates judged had 80% agreement amongst the judges. Judges appeared to differ primarily in their criteria for judgment rather than their interpretation of the clone candidates. In subsequent open discussion the judges provided several reasons for their judgments. The study casts additional doubt on the reliability of experimental results in the field when the full criterion for clone judgment is not spelled out

    Requirements specifications and recovered architectures as grounded theories

    Get PDF
    This paper describes the classic grounded theory (GT) process as a method to discover GTs to be subjected to later empirical validation. The paper shows that a well conducted instance of requirements engineering or of architecture recovery resembles an instance of the GT process for the purpose of discovering the requirements specification or recovered architecture artifact that the requirements engineering or architecture recovery produces. Therefore, this artifact resembles a GT

    Toward a Taxonomy of Clones in Source Code: A Case Study

    No full text
    Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [9, 7]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on characterizing how, where, and why clones occur in industrial software systems. In this paper, we present a preliminary categorization scheme for code clones, and we discuss how we have applied this taxonomy in a case study performed on the file system subsystem of the Linux operating system. Our case study yielded many surprising results, including that cloning is rampant both within particular file system implementations and across different ones, and that as many as 13% of the 4407 functions that are more than six lines long were involved in a clone-pair relationship

    Aiding Comprehension of Cloning Through Categorization

    No full text
    Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to augment current clone detection tools in order to increase the user comprehension of duplication of code within software systems and filter false positives from the clone set. We support our arguments by means of 2 case studies, where we found that as much as 53% of clones can be grouped to form Function clones or Partial Function clones and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs

    Clone Detection: How accurate is your data set?

    No full text
    Duplication of code in software systems is considered to be a serious problem that can affect a systems maintainability and extendability. It is reported that 10-15% of code in a software system is involved in cloning. However, because of the difficultly of objectively measuring the number of false positives in a clone result set, the accuracy of these reports is difficult to evaluate. Although an important topic, little work has been done in the area of evaluating the accuracy of clone detection methods. In this paper we propose a study to estimate the number of false positives that are likely to be in a data set in an objective way by measuring the number of clones found in a large body of unrelated code. We also propose a method to measure the impact of external factors such as programing idioms and API protocols on the detected results set. The results of this work will provide tools and knowledge to better evaluate the current state of the art of clone detection research

    Noname manuscript No. (will be inserted by the editor)

    No full text
    the date of receipt and acceptance should be inserted later Abstract Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system’s design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71 % of the clones could be considered to have a positive impact on the maintainability of the software system.

    Cloning by Accident: An Empirical Study of Source Code Cloning across Software Systems

    No full text
    One of the key goals of open source development is the sharing of knowledge, experience, and solutions that pertain to a software system and its problem domain. Source code cloning is one way in which expertise can be reused across systems; cloning is known to have been used in several open source projects, such as the SCSI drivers of the Linux kernel [16] . In this paper, we discuss two case studies in which we performed clone detection and analysis on several open source systems within the same domain: we examined nine text editors written in C, and eight X-Windows window managers written in C and C++. To our surprise, we found little evidence of "true" cloning activity, but we did notice a significant number of "accidental" clones --- that is, code fragments that are similar due to the precise protocols they must use when interacting with a given API or set of libraries. We further discuss the nature of "true" versus "accidental" clones, as well as the details of our case studies

    Four Interesting Ways in Which History Can Teach Us About Software

    No full text
    In this position paper, we outline four kinds of studies that we have undertaken in trying to understand various aspects of a software system's evolutionary history. In each instance, the studies have involved detailed examination of real software systems based on "facts" extracted from various kinds of source artifact repositories, as well as the development of accompanying tools to aid in the extraction, abstraction, and comprehension processes. We briefly discuss the goals, results, and methodology of each approach
    corecore